NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Audio-Visual Neural Syntax Acquisition

Lai, Cheng-I Jeff; Shi, Freda; Peng, Puyuan; Kim, Yoon; Gimpel, Kevin; Chang, Shiyu; Chuang, Yung-Sung; Bhati, Saurabhchand; Cox, David; Harwath, David; et al (December 2023, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU))

Full Text Available
Moment Distributionally Robust Tree Structured Prediction

Li, Yeshu; Saeed, Danyal; Zhang, Xinhua; Ziebart, Brian D.; Gimpel, Kevin (December 2022, Advances in neural information processing systems)

Structured prediction of tree-shaped objects is heavily studied under the name of syntactic dependency parsing. Current practice based on maximum likelihood or margin is either agnostic to or inconsistent with the evaluation loss. Risk minimization alleviates the discrepancy between training and test objectives but typically induces a non-convex problem. These approaches adopt explicit regularization to combat overfitting without probabilistic interpretation. We propose a momentbased distributionally robust optimization approach for tree structured prediction, where the worst-case expected loss over a set of distributions within bounded moment divergence from the empirical distribution is minimized. We develop efficient algorithms for arborescences and other variants of trees. We derive Fisher consistency, convergence rates and generalization bounds for our proposed method. We evaluate its empirical effectiveness on dependency parsing benchmarks.
more » « less
Full Text Available
Moment Distributionally Robust Tree Structured Prediction

Li, Yeshu; Saeed, Danyal; Zhang, Xinhua; Ziebart, Brian D; Gimpel, Kevin (December 2022, Advances in neural information processing systems)

Full Text Available
Chess as a Testbed for Language Model State Tracking

https://doi.org/10.1609/aaai.v36i10.21390

Toshniwal, Shubham; Wiseman, Sam; Livescu, Karen; Gimpel, Kevin (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Transformer language models have made tremendous strides in natural language understanding tasks. However, the complexity of natural language makes it challenging to ascertain how accurately these models are tracking the world state underlying the text. Motivated by this issue, we consider the task of language modeling for the game of chess. Unlike natural language, chess notations describe a simple, constrained, and deterministic domain. Moreover, we observe that the appropriate choice of chess notation allows for directly probing the world state, without requiring any additional probing-related machinery. We find that: (a) With enough training data, transformer language models can learn to track pieces and predict legal moves with high accuracy when trained solely on move sequences. (b) For small training sets providing access to board state information during training can yield significant improvements. (c) The success of transformer language models is dependent on access to the entire game history i.e. “full attention”. Approximating this full attention results in a significant performance drop. We propose this testbed as a benchmark for future work on the development and analysis of transformer language models.
more » « less
Full Text Available
Reconsidering the Past: Optimizing Hidden States in Language Models

https://doi.org/10.18653/v1/2021.findings-emnlp.346

Yoshida, Davis; Gimpel, Kevin (January 2021, Findings of the Association for Computational Linguistics: EMNLP 2021)

Full Text Available
PeTra: A Sparsely Supervised Memory Model for People Tracking

https://doi.org/10.18653/v1/2020.acl-main.481

Toshniwal, Shubham; Ettinger, Allyson; Gimpel, Kevin; Livescu, Karen (July 2020, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics)

We propose PeTra, a memory-augmented neural network designed to track entities in its memory slots. PeTra is trained using sparse annotation from the GAP pronoun resolution dataset and outperforms a prior memory model on the task while using a simpler architecture. We empirically compare key modeling choices, finding that we can simplify several aspects of the design of the memory module while retaining strong performance. To measure the people tracking capability of memory models, we (a) propose a new diagnostic evaluation based on counting the number of unique entities in text, and (b) conduct a small scale human evaluation to compare evidence of people tracking in the memory logs of PeTra relative to a previous approach. PeTra is highly effective in both evaluations, demonstrating its ability to track people in its memory despite being trained with limited annotation.
more » « less
Full Text Available
PeTra: A Sparsely Supervised Memory Model for People Tracking

https://doi.org/10.18653/v1/2020.acl-main.481

Toshniwal, Shubham; Ettinger, Allyson; Gimpel, Kevin; Livescu, Karen (January 2020, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics)

Full Text Available
Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks

https://doi.org/10.18653/v1/2020.emnlp-main.685

Toshniwal, Shubham; Wiseman, Sam; Ettinger, Allyson; Livescu, Karen; Gimpel, Kevin (January 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP))

Full Text Available
Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks

Toshniwal, Shubham; Wiseman, Sam; Ettinger, Allyson; Livescu, Karen; Gimpel, Kevin (January 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing)
null (Ed.)
Full Text Available
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Shoeb, Abu Awal; Abid, Abubakar; Fisch, Adam; Brown, Adam R.; Santoro, Adam; Gupta, Aditya; Garriga-Alonso, Adri; et al (January 2023, Transactions on machine learning research)

Full Text Available

Search for: All records